Here we will implement a code to fit a distribution to a give data, where we will using techniques from non-linear equation solving which is used to find MLEs.

OBJECTIVE

Fit a Gamma Distribution to the given data set using Maximum Likelihood Estimators. The first few values of Data looks like this -

x
0.6482746
0.3531422
1.0052057
1.1744683
0.0224576
0.2909537

LIKELIHOOD FUNCTIONS

The density of the Gamma distribution is -

\[ f(x;p,a)\ =\ \frac{a^p}{\Gamma(p)}.x^{p-1}.e^{-ax}\ ;where\ x,\ p,\ a>0 \]

The Likelihood Function is -

\[ L(p,a)\ =\ \prod_{i=1}^n f(x;p,a) \]

where, \(x_1,\dots, x_n\) are the data in the file.

Now, the Likelihood Function when evaluated results in really small values which results in numerically improper results. So, we will instead use Log-Likehood and it’s derivatives.

\[ \frac{\partial \log(L)}{\partial p}\ =\ n.\log(a)\ +\ \log(\prod_{i=1}^n x_i) - n.\digamma(p) \]

\[ \frac{\partial \log(L)}{\partial a}\ =\ n.\frac{p}{a}\ -\ \sum_{i=1}^n x_i \]

The Jacobian Matrix for the above System of 2 Equations in 2 variables is -

\[ J\ =\ \begin{bmatrix} -n.\frac{d^2\log(\Gamma(p))}{dp^2}&\frac{n}{a} \\ \frac{n}{a}&-n.\frac{p}{a^2} \\ \end{bmatrix} \]

We now have all the tools necessary to implement the Newton-Raphson Method.

FITTING THE DISTRIBUTION - CODE & RESULTS

Let’s First have a look at the Histogram of the data -

data1 <- tibble(x = datalist)

data1 %>%
  ggplot(aes(x)) +
  geom_histogram(bins = 20) +
  theme_bw() +
  ggtitle("Histgram of the Data")

Let’s implement the Newton-Raphson’s Method to find the roots of the Log-Likelihood Function.

L.dash = function(x){
  p = x[1]
  a = x[2]

  dellogL.delp = n*log(a) + log(product) - n*digamma(p)
  dellogL.dela = (n*p/a) - summation
  
  return(c(dellogL.delp, dellogL.dela))
}

J = function(x) {
  p = x[1]
  a = x[2]

  vec = c(-n*trigamma(p), n/a, n/a, -n*p/a^2)
  J <- matrix(data = vec, ncol = 2)
  return(J)
}
NR2 = function(f, d, x0, n){
  vals <- tibble(p = rep(0, n), a = rep(0, n))
  for (i in 1:n) {
    ( x0 = x0 - solve(d(x0), f(x0)) )
    vals[i, 1] <- x0[1]
    vals[i, 2] <- x0[2]
  }
  return(vals)
}

Now, Running the above function.

kable(tail(NR2(L.dash, J, c(1,1), 10)), align = 'l')
p a
1.946418 2.878889
1.946419 2.878890
1.946419 2.878890
1.946419 2.878890
1.946419 2.878890
1.946419 2.878890

It seems our Function found a root at \(p=1.946,\ a=2.878 (approx.)\).

Let’s overlay the Gamma Distribution with these parameters on the Histogram to see how good the fit is.

Animated NR Iteration Fitting

tibble(x = datalist) %>%
  ggplot(aes(x, ..density..)) +
  geom_histogram(bins = 30) +
  geom_line(data = tibble(x = seq(min(datalist, na.rm = TRUE), max(datalist, na.rm = TRUE), 0.01), fit = dgamma(seq(min(datalist, na.rm = TRUE), max(datalist, na.rm = TRUE), 0.01), 1.946419, 2.878890)), aes(x, fit), size = 2, color = "blue") +
  theme_bw() +
  ggtitle("Gamma Distribution Fitted")

It seems like a decent fit visually!